Abstract

We present a novel integrative pipeline for discovering potential
cancer-susceptibility regions (PCSRs) by counting the number of altered
genes at each chromosomal region, using expression microarray datasets
from different human cancers (HCs). The approach first predicts PCSRs and
then identifies key genes in these regions, yielding candidate regions
that may harbor new cancer-associated variants. Beyond finding new
cancer-causal variants, predicting such risk regions makes it possible to
study different types of genomic variants simultaneously by focusing on
specific chromosomal regions. Using this pipeline, we extracted a number
of regions with highly altered expression levels in cancer. Regulatory
networks were also constructed for different types of cancers following
the identification of altered mRNAs and microRNAs. Interestingly, GAPDH,
LIFR, ZEB2, mir-21, mir-30a, mir-141 and mir-200c, all located at PCSRs,
were common altered factors in the constructed networks. We found a number
of clusters of altered mRNAs and miRNAs on predicted PCSRs (e.g.,
12p13.31), along with their common regulators, including KLF4 and SOX10.
Large-scale prediction of risk regions based on transcriptome data can
open a window onto the comprehensive study of risk factors for cancer and
other human diseases.
Introduction

Alterations in mRNA and miRNA expression, and the important roles of
many of these molecules, have been studied in the initiation,
progression and metastasis of many types of cancers [1,2,3]. Changes in
DNA methylation and transcription factor (TF) regulation, genomic copy
number variation (CNV) [4], single nucleotide polymorphism (SNP) [5]
and microsatellite alteration [6], as well as other chromosomal
aberrations, are characterized as major mechanisms of expression
alteration in different human cancers (HCs).

Different methods including genome wide association studies 
(GWAS) have identified a large number of associated variants for 
different cancers [7,8,9]. For example, common variants on region
19p13 were found to be associated with ovarian cancer [10],
CNVs at 6q13 and five risk loci at 21q21.3, 5p13.1, 21q22.3,
22q13.32 and 10q26.11 were directly linked to pancreatic cancer
[4,11]. In addition, new risk loci at 10q25.2, 6q22.2 and 6p21.32
were associated with lung cancer [12], and several risk loci at
9q31.2, 19q13.4 and 8q24 were shown to be associated with
prostate cancer [13,14,15].

However, the challenges in GWAS are finding causal variants and
their functional effects, as well as the interrelation of these
variants in cancer. While previous genetic studies of cancer have
predicted a large number of cancer-associated variants
[8,9,10,15,16], identifying causal variants remains a major obstacle,
because the known causal genetic variants are mostly located within
non-coding regions or at various physical distances from the genes
they influence [17]. In addition, the linear modeling framework
employed in GWAS often considers only one SNP at a time and ignores
the effects of the other genotyped SNPs [5]. Therefore, progressing
from a statistical association obtained through GWAS to inferred
causality and functional consequences for cancer can be arduous.
Another challenge in large-scale genomics investigations is that some
of these variants, including microsatellites, have been less studied
than the other types (SNPs and CNVs). In addition, many of these
studies focus on one type of genomic variation in cancer;
consequently, the impacts of other involved factors are neglected.

The common procedure employed in previous studies is detection of
causal variants followed by a search for their functional effects,
such as association of variants with expression quantitative trait
loci (eQTLs) [17]. However, there is also a reverse strategy, which
comprises prediction of potential cancer-risk regions shared across
different types of cancers based on transcriptome expression data,
followed by a search for causal variants. Identification of these
regions assists in the discovery of new variants as well as the
simultaneous study of different factors affecting gene expression, by
limiting assessments to specific chromosomal regions. Here, we
developed a pipeline that predicts PCSRs by calculating the
transcript-expression changes under cancer for each chromosomal
region. We also extracted common altered mRNAs and microRNAs using
microarray and expressed sequence tag (EST) data, followed by network
analysis to gain more insight into the predicted PCSRs. Using this
pipeline, we predicted potential risk regions interacting with
clusters of targets (mRNAs, miRNAs and/or TFs), revealing potential
candidates for further genome association studies.

Results 

Gene expression data of several types of cancers were 
reanalyzed and the results were combined to predict common 
cancer-risk regions. Another aim of this study was to obtain insight 
into interrelation between PCSRs and altered mRNAs, miRNAs 
and their common regulators. An overview of the workflow is 
shown in Figure 1. 

Results of transcript expression analyses for each cancer dataset,
including breast, colorectal, endometrial, gastric, liver, lung,
ovarian, pancreatic, prostate, testicular, bladder, intestine
neuroendocrine, cervical and renal cancers as well as glioblastoma, are
presented in Table S1. These extracted genes and miRNAs were
then used for further analysis as outlined below.







Figure 1. Analysis workflow for prediction of potential risk
regions. It comprises expression data analysis of different human
cancers including breast, colorectal, endometrial, gastric, liver, lung,
ovarian, pancreatic, prostate, testicular, bladder, intestine
neuroendocrine, cervical and renal cancers as well as glioblastoma. This
primary analysis is followed by extraction of altered genes, counting of
the chromosomal regions of altered genes and prediction of risk regions
based on region frequency.

doi:10.1371/journal.pone.0096320.g001



Prediction of Potential Cancer-Susceptibility Regions 
Using Microarray Datasets of Different Cancers 

The percentage of region participation was calculated for each
chromosome (chr) from microarray data (with a 2-fold change
threshold) of 11 HCs. Details of the procedure are described in
Materials and Methods. For each chromosome, the five regions
covering the highest frequency of altered genes were recorded as
PCSRs (Table 1). Among these PCSRs, two regions showed the highest
participation of over-expressed genes: chr1p31.2 (27.27%) and
chr13q13.2 (20.45%) (Table 1, Columns 3 to 7). For down-expressed
genes, the highest percentages were recorded for regions located at
chr13q13 (15.53%) and 4q34.2 (15.15%).

To test the reliability of the predicted PCSRs, the percentage of
region participation in cancer was also calculated with a different
threshold, in which the frequencies of the first 200 probesets with
the highest fold changes were identified for each region (Table S2).
A large number of these regions, including 1q31.3, 2p25.2, 3q25.2,
12p13.31 and 22q12.1, were shared between both thresholds
(Table 1 and Table S2), whereas some regions were recorded as a PCSR
for only one of the thresholds. For example, 1p32.2 and 2q22.3
were identified with the 2-fold change threshold, whereas 1p22.3
and 2p12 were recorded for the highest fold changes (Table 1 and
Table S2).
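
The comparison between the two thresholds amounts to a set overlap. The following is a minimal Python sketch, not the authors' script: the two sets contain only the example regions named in the paragraph above (not the full contents of Table 1 or Table S2), and the function name `compare_thresholds` is hypothetical.

```python
# Illustrative subsets only: regions named in the text, not the full tables.
pcsr_two_fold = {"1q31.3", "2p25.2", "3q25.2", "12p13.31", "22q12.1",
                 "1p32.2", "2q22.3"}
pcsr_top_200 = {"1q31.3", "2p25.2", "3q25.2", "12p13.31", "22q12.1",
                "1p22.3", "2p12"}


def compare_thresholds(first, second):
    """Split regions into those shared by both thresholds and those unique to one."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


overlap = compare_thresholds(pcsr_two_fold, pcsr_top_200)
for label in ("shared", "only_first", "only_second"):
    print(label, sorted(overlap[label]))
```

Regions in the `shared` group are the ones that are robust to the choice of threshold in this toy example.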

The percentage of chromosome participation was also calculated for
11 HCs, to identify which chromosome(s) are more involved in
transcript expression changes (Table S3). Results showed that chr4
harbors the highest number of genes altered in cancer
(excluding prostate and gastric cancers) (Table S3). In contrast,
chrY has the lowest number of genes expressed in cancer. A
summary of chromosomal participation across the 11 HCs shows
significant differences, as indicated by a general chi-squared test.
The four top chromosomes harboring the most down-expressed genes
were chrs 4, 5, 13 and X, whereas for over-expressed genes
the highest numbers of alterations were recorded for chrs 1,
7, 8 and 12 (Figure S1).

Altered mRNAs Shared across Different Types of Cancers

Differentially expressed mRNAs with the highest fold changes in
at least 6 HCs were selected as the common altered mRNAs
(Table 2 and Table 3). These common altered mRNAs were
classified into three expression groups. Class I showed
over-expression in the majority of cancer types, such as tubulin alpha
1b (TUBA1B) and glyceraldehyde-3-phosphate dehydrogenase
(GAPDH) (Table 2); class II showed down-expression in most
HCs, such as aspartoacylase (ASPA) and chemokine (C-X-C
motif) ligand 12 (CXCL12) (Table 2); the rest (class III)
showed mixed expression patterns in different types of cancers,
such as protein kinase (cAMP-dependent, catalytic) inhibitor beta
(PKIB) (Table 3).

Interestingly, a number of common altered mRNAs are located
on the predicted PCSRs (Column 3 of Table 2 and Table 3). For
example, GAPDH at 12p13.31 (a predicted PCSR) showed
over-expression in all HCs (Table 2). CKS2 (chr9q22.2),
CEP55 (chr10q23.33), UHRF1 (chr19p13.3), RRM2 (chr2p25.1),
AURKA (chr20q13.2), FLJ39632 (chr14q11.2), FAM83D
(chr20q11.23), NEK2 (chr1q32.3) and MAD2L (chr4q27) were
all located on PCSRs and showed over-expression in 9, 8, 10,
9, 8, 9, 9, 8 and 9 types of cancers, respectively (Table 2 and
Table 3). In contrast, DCN (chr12q21.33), LIFR (chr5p13.1),
ABCA8 (chr17q24.2), C7 (chr5p13.1) and ZEB2 (chr2q22.3) on
predicted PCSRs were down-expressed in 9, 7, 8, 8 and 8 cancers,
respectively (Table 2 and Table 3). The rest of the altered genes on
PCSRs exhibited both down- and over-expression patterns
(Table 3).

Altered miRNAs Shared across Different Cancers

Several miRNAs (such as miR-93, mir-182, mir-196b
and mir-1274b) exhibited over-expression in the majority of cancers
(Table 3). A number of miRNAs (such as miR-30a and mir-30c-2)
were down-expressed in various HCs, whereas many other
miRNAs exhibited a mixed pattern of expression (Table 4).

The chromosomal locations were determined for common
altered miRNAs. Interestingly, miRNAs located on the same
region showed co-expression in some cancers, such as a cluster at
19q13.41 (including mir-99b and mir-125a). This cluster (19q13.41)
was down-expressed in cervical, prostate and renal cancers. In
contrast, the same cluster was over-expressed in bladder cancer.
Another co-expressed cluster was observed at 12p13.31 (mir-141
and mir-200c), which showed over-expression in ovarian,
prostate and bladder cancers and, conversely, was down-expressed
in renal cancer (Table 4). The remaining co-expressed
clusters were listed for regions at 6q13 (including mir-30a and
mir-30c-2), Xp11.23 (including mir-362, mir-500, mir-501, mir-502
and mir-532), 14q32.2 (including mir-134, mir-379 and mir-382),
14q32.31 (including mir-127, mir-432 and mir-770), 9q22.32
(including let-7d, mir-23b and mir-27b) and 7q22.1 (including
mir-93 and mir-106b) (Table 3). Five out of nine miRNA co-expressed
clusters listed above are located at predicted PCSRs, including
6q13, 12p13.31, 14q32.2, 19q13.41 and Xq26.2 (Table 4).

Interaction within and between Common Altered mRNAs
and miRNAs Revealed by Network Analysis

Four separate networks were constructed: a network
for common altered mRNAs (with 409 entities and 1288 relations)
(Figure S2), a network for common altered mRNAs located on the
different predicted PCSRs (with 383 entities and 1121 relations)
(Figure S3), a network of common altered miRNAs (with 322
entities and 1041 relations) (Figure S4) and a network for common
altered miRNAs located on the different PCSRs (with 123 entities
and 409 relations) (Figure S5). In addition, a combined network
was constructed by integrating the altered mRNA and miRNA
data, which has 667 entities and 2482 relations (Figure S6).
Various types of transcription factors, protein kinases, small
molecules, mRNAs and miRNAs serve as either validated or
putative regulators in these networks. Additional details of each
network, including the number of imported genes and the biological
processes, are presented in Table S4.

We identified networks with similar biological processes, such as
cellular process, biological regulation, metabolic process,
multicellular organismal process, developmental process and response
to stimulus (Table S4, Column 5). These shared processes imply the
existence of common genes and miRNAs across the different
constructed networks, as listed in Table S5. For example, zinc
finger E-box binding homeobox 2 (ZEB2), DEAD (Asp-Glu-Ala-Asp)
box helicase 5 (DDX5) and leukemia inhibitory factor
receptor alpha (LIFR) were shared between the constructed
networks of common altered mRNAs and miRNAs (Table S5).
Among common altered miRNAs, mir-21, mir-30a, mir-141 and
mir-200c were shared across all four constructed networks
(Table S5).

The most frequent subnetwork observed in these networks was
centered on DDX5 (Figure 2). This subnetwork comprises 5
entities: DDX5, mir-20b, mir-21, mir-141 and mir-182.
DDX5 is negatively regulated by mir-20b and mir-141, while
DDX5 itself regulates mir-21 and mir-182. Down-expression of
DDX5 was observed in 7 types of HCs, while mir-20b, mir-21,
mir-141 and mir-182 were over-expressed in 3, 5, 3 and 4 HCs,
respectively (Table 3 and Table 4). This suggests a negative
interrelation between DDX5 and these four miRNAs.

Another subnetwork was constructed based on mir-141, mir-200c
and GAPDH, which are all located on the predicted PCSR at
12p13.31 (Figure 3). This network comprises 17 entities and 29
relations (Figure 3). Thirteen downstream targets were observed
for mir-141, mir-200c and GAPDH. For example, mir-141 and
mir-200c, which were over-expressed in 3 HCs (shown in purple in
Figure 3), have miRNA effects on ZEB2 (down-expressed
in 7 HCs). Interestingly, these altered RNAs, including mir-141,
mir-200c and GAPDH (at 12p13.31) and also ZEB2 (at 2q22.3),
are all located at predicted PCSRs. Among upstream nodes,
TP53 and MYC were observed as upstream regulators of mir-200c
and GAPDH (Figure 3). TP53 is a common positive regulator of
both mir-200c and GAPDH, whereas MYC regulates only GAPDH
(Figure 3).

Promoter Analysis of Altered mRNAs and miRNAs across
Different Cancers

Promoters of over-expressed and down-expressed mRNAs and
miRNAs were individually analyzed across different cancers. Lists
of common transcription factors for the sets of down-expressed
and over-expressed mRNAs are provided in Tables S6 and S7,
respectively. Among 18 common predicted TFs for over-expressed
mRNAs, Kruppel-like factor 4 (KLF4), located at a PCSR, was
found to be down-expressed in 7 types of cancers (Table S6).
Of the 13 common regulators predicted for down-expressed
mRNAs, 6 are located on PCSRs. Among
these 6 regulators, RAR-related orphan receptor A (RORA) was
down-expressed in 8 types of cancers (the exceptions being
glioblastoma, with over-expression, and prostate and gastric
cancers, with no significant change in expression) (Table S7).

Common regulators were also predicted for clusters of altered
miRNAs on the same region (Table S8). For example, GATA2,
GATA3, ETS1, MZF1_1-4, SOX10, YY1, ZNF354C and SPI1
were predicted for miRNAs located in the cluster at Xp11.23 (Table
S8). In total, 22 common regulators were predicted for different
clusters of miRNAs, of which eight are located at PCSRs:
YY1, SPIB, SOX10, NFIC, NR4A2, FOXD1,
NFATC2 and HOXA5 (Table S9). Interestingly, GATA2 was
predicted for both down-expressed mRNAs and altered miRNAs.

Discussion 

An effective pipeline was developed to predict PCSRs using
microarray datasets from different cancer studies. Two different
thresholds were applied to predict PCSRs: probesets with
at least 2-fold changes and the first 200 probesets with the highest
fold changes. Most of the predicted PCSRs on each chromosome were
similar under both thresholds, which supports the reliability of
these PCSRs.

In addition to this confirmation, a literature review revealed
the presence of several important cancer-associated variants
on our predicted PCSRs. These variants have been reported
previously for pancreatic [4,11] (6q13, 21q21.3, 5p13.1, 21q22.3
and 22q13.32), lung [12] (6p21.32), prostate [13,14,15] (9q31.2,
19q13.4, 8q24 and 17q21-q22), ovarian [10] (19p13), breast [18]
(8q24, 12p13 and 20q13) and colorectal cancer [19] (11q23, 8q24
and 18q21). In agreement with these studies, our findings
identified region 8q24 as a risk region in a variety of HCs
[8,14,19,20,21], which shows that some risk regions are involved
in several types of cancers rather than a specific cancer. Moreover,
some of the PCSRs predicted in this study have been reported in other
types of human diseases, including herpes simplex virus type 1 [22]
(21q), polycystic ovary syndrome [23] (9q33.3), and type 1 diabetes
and rheumatoid arthritis [24] (both located on 18p11). This
similarity might indicate the efficiency of our approach in
predicting risk regions associated with human diseases
other than cancer.

We also found that eight chromosomes harbor the most altered
genes in different types of cancer: chromosomes 1, 4, 5, 7,
8, 12, 13 and X. Interestingly, chromosomes 1, 4 and 13 were also
recorded as the chromosomes with the highest percentage of
predicted PCSRs, which suggests an important role for these
chromosomes in cancer biology. Based on these results and those
previously reported on chromosomal abnormalities [7,25,26,27], we
suggest that our pipeline is able to predict risk regions as
well as risk chromosomes in a variety of diseases including cancer.
This pipeline can also be applied to the fast-growing (but still
limited) number of RNA-seq datasets in future studies.

Network analysis indicates that DDX5, LIFR, ZEB2, mir-21,
mir-27b, mir-30a, mir-141, mir-182 and mir-200c were shared
across the different constructed networks, indicating their crucial
role in cancer biology and progression, which has been reported
previously [28,29,30]. For example, DDX5 and its associated
miRNAs (mir-21 and mir-182) have been suggested as potential
therapeutic targets in breast cancer [29,31]. In
addition, the clinical application of different miRNAs in cancer, such
as let-7, mir-21 and mir-122, is discussed in a recent study by
Nana-Sinkam and Croce [28].

Because miRNAs do not function in isolation [28], we analyzed
the clusters of miRNAs on the same regions to understand the relative
contribution of multiple miRNAs rather than individual miRNAs.
Co-expression of different miRNAs implies the presence of
common transcription regulators and/or common causal variants
for these regions. It has also been reported previously that common
modules on promoters can cause co-expression of genes
[32].

We found that different common regulators of altered mRNAs
and miRNAs, including KLF4 (at 9q31.2) and RORA (at 15q22.2),
were on the predicted PCSRs. These two TFs mediate a set of
cell-cycle genes and exhibit both oncogenic and tumor-suppressive
functions [33,34]. Interestingly, down-expression of mir-30c-2 (at
6q13) as well as over-expression of GATA3 was observed across
different types of HCs in this study, which is consistent with
regulation of mir-30c-2 through GATA3. Bockhorn and colleagues
recently demonstrated that mir-30c is transcriptionally regulated by
GATA3 [35].

Our results suggest another level of interrelation between
cancer-risk regions, in which mRNAs and their common
regulators at different PCSRs interact with each other as well as
with their targets. The subnetwork centered on DDX5, with a total of 5
nodes and 4 relations (Figure 2), and the subnetwork of GAPDH,
mir-141 and mir-200c (Figure 3) support such interactions. In
these subnetworks, different RNAs are located on PCSRs,
including GAPDH, ZEB2, mir-20b, mir-21, mir-141 and
mir-200c, supporting the important effects of these RNAs and their
regions in cancer.

The subnetwork centered on DDX5 is shared across the networks
constructed for altered mRNAs and miRNAs in different cancers.
The RNA helicase DDX5 (also known as p68) is involved in RNA
metabolism, serves as a transcriptional co-regulator and has
been reported as a regulator of mir-182 in breast cancer [29].
A significant association has also been reported between DDX5
rs1991401 (OP = 7.90 × 10^-5) and malignant peripheral nerve
sheath tumor [36]. Our results suggest that up-regulation of
mir-20b and mir-141 down-regulates DDX5.
Figure 2. Subnetwork centered on DDX5 derived from the network of common altered variants in different cancers. The network includes
mir-21, mir-182, mir-20b and mir-141. It was constructed using Pathway Studio 9 software and assembled based on bioinformatics
and literature, combined with biological interpretation of the microarray data and enriched Gene Ontology functional groups. Red: over-regulated
entities in most cancers. Blue: down-regulated entities in most cancers. ^ represents negative regulation.
doi:10.1371/journal.pone.0096320.g002



The second subnetwork (Figure 3) contained GAPDH, mir-141 and
mir-200c, which are located at 12p13.31, a predicted PCSR.
Amplification of the 12p13 region has been observed in breast cancer
[37], T cell lymphomas and lymphocytic leukemia [38,39], causing
over-expression of GAPDH, mir-141 and mir-200c. Upstream
regulators may be involved in the up-regulation of these RNAs, and a
positive effect has been reported for TP53 located on the upstream
region of GAPDH [40]. In addition, Yoshihara et al. [41] reported
some CNVs unique to sporadic ovarian cancer at 12p13.31. In
general, these reports in combination with our in silico findings
indicate a crucial role for 12p13.31 in HCs.



Interestingly, some other RNAs common between cancers in
this report have been observed in prior studies of tumors and other
diseases [16,42]. For example, a synonymous SNP
(rs12948217) affecting an exonic splicing enhancer site near
ASPA has been reported for neurodegenerative disease [43]. Loss
of regions including 14q32.2 (location of mir-127, mir-432 and
mir-770) and 14q32.31 (mir-134, mir-379 and mir-382) was
reported in previous studies of renal cancer and osteosarcoma
[16,44]. In our study, miRNAs located at 14q32.2 and 14q32.31
showed down-expression in several cancers, implying
down-expression of miRNAs following chromosome loss in these regions.




Figure 3. Network of common altered variants in different cancers including mir-200c, mir-141 and GAPDH at 12p13.3. The network was
constructed using Pathway Studio 9 software with the shortest path algorithm. It was assembled based on
bioinformatics and literature, combined with biological interpretation of the microarray data and enriched Gene Ontology functional groups. Purple:
over-regulated entities in most cancers. Blue: down-regulated entities in most cancers. O-vertex represents TFs, © represents positive regulation,
and ^ represents negative regulation.
doi:10.1371/journal.pone.0096320.g003
In conclusion, the PCSRs predicted in the current study open a new
avenue for further genome association studies aimed at finding different
types of cancer-causal variants. Since multiple variations
accumulated in a gene or a cluster of genes may all contribute to the
phenotype, studying different types of variations or regulatory
mechanisms over a gene, a cluster of genes or a specific region might
be a useful tool for improving association detection. The
common altered RNAs identified at PCSRs in our constructed
networks have great potential to be used for finding associated
SNPs, CNVs and/or SSRs near these genes. In addition, these
results suggest the potential of novel regulator-based (rather than
gene-based) cancer therapy to restore disrupted
clusters of mRNAs and/or miRNAs. In general, our pipeline can
be used to predict cancer-risk regions and cancer-risk
chromosomes.

Methods 

Expression Data Analysis 

Raw CEL expression data for different HCs were obtained from the
Gene Expression Omnibus (GEO) database (Table S10). The
RMA (Robust Multichip Average) algorithm was first applied to
the raw microarray data to obtain normalized data using
Expression Console software (Affymetrix, CA, USA). Data were
then analyzed using FlexArray software
(http://genomequebec.mcgill.ca/FlexArray/). The differential gene
expression pattern for each experiment (cancer vs. normal) was
evaluated using an empirical Bayes test (a moderated t test)
(p<0.05). mRNAs exhibiting at least 2-fold changes and miRNAs
exhibiting at least 1.5-fold changes in expression were selected for
further analysis. In addition, a 1.2-fold change threshold
was used to trace common altered mRNAs and miRNAs in
different cancers.
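
The fold-change and p-value filter described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FlexArray procedure: it assumes log2-scale intensities (as RMA typically produces) and p-values that were already computed elsewhere, and the probe identifiers, values and function names are all hypothetical.

```python
def symmetrical_fold_change(log2_cancer, log2_normal):
    """Fold change from log2 intensities; down-regulation is reported as -1/ratio."""
    ratio = 2.0 ** (log2_cancer - log2_normal)
    return ratio if ratio >= 1.0 else -1.0 / ratio


def select_altered(records, min_fold=2.0, max_p=0.05):
    """Keep probes that pass both the p-value and the absolute fold-change cut-off."""
    selected = []
    for probe_id, log2_cancer, log2_normal, p_value in records:
        fold = symmetrical_fold_change(log2_cancer, log2_normal)
        if p_value < max_p and abs(fold) >= min_fold:
            direction = "over" if fold > 0 else "down"
            selected.append((probe_id, round(fold, 2), direction))
    return selected


# Hypothetical records: (probe id, mean log2 cancer, mean log2 normal, p-value)
records = [
    ("probe_A", 9.0, 7.0, 0.001),   # 4-fold higher in cancer
    ("probe_B", 6.0, 8.0, 0.010),   # 4-fold lower in cancer
    ("probe_C", 7.5, 7.0, 0.001),   # significant, but below the 2-fold cut-off
    ("probe_D", 10.0, 7.0, 0.200),  # large change, but p-value too high
]
altered = select_altered(records)
print(altered)
```

Passing `min_fold=1.5` would correspond to the miRNA cut-off, and `min_fold=1.2` to the relaxed threshold used for tracing common altered RNAs.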

The digital differential display (DDD) tool
(http://www.ncbi.nlm.nih.gov/UniGene/ddd.cgi) was used to screen the
cancer-related genes in different HCs. EST libraries selected for DDD
comparisons of different tissues (cancer vs. normal) are listed in
Table S11. Pools A and B were assigned to normal and cancerous
libraries in each cancer, respectively. The output provided a
numerical value in each pool denoting the fraction of sequences
within the pool that mapped to the UniGene cluster. Statistically
significant hits (Fisher's exact test) showing >10-fold differences
were compiled, and a preliminary database was created. Fold
differences were calculated as the ratio of pool B/pool A,
according to a previously described method [45].
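
The pool B/pool A ratio can be illustrated with a short Python sketch. It covers only the fold-difference step, not the Fisher's exact test, and it makes two assumptions that the text does not state: a zero fraction in pool A is treated as "undefined", and ">10-fold" is read as a difference in either direction. The fractions and function names are hypothetical.

```python
def ddd_fold_difference(fraction_normal, fraction_cancer):
    """Ratio of pool B (cancer) to pool A (normal); None if pool A is zero."""
    if fraction_normal == 0:
        return None
    return fraction_cancer / fraction_normal


def passes_fold_filter(fold, threshold=10.0):
    """Assumption: a >10-fold difference in either direction counts as a hit."""
    if fold is None or fold <= 0:
        return False
    return fold > threshold or fold < 1.0 / threshold


# Hypothetical fractions of ESTs mapping to one UniGene cluster in each pool
fold_up = ddd_fold_difference(0.0002, 0.0030)    # about 15-fold higher in cancer
fold_down = ddd_fold_difference(0.0040, 0.0002)  # about 20-fold lower in cancer
fold_small = ddd_fold_difference(0.0010, 0.0020) # about 2-fold, filtered out
print(passes_fold_filter(fold_up), passes_fold_filter(fold_down),
      passes_fold_filter(fold_small))
```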

Among probesets with the highest fold changes, common altered
mRNAs and miRNAs (in at least 6 out of 11 HCs) were extracted
using the DDD tool together with microarray datasets. These
common altered RNAs were afterward used for network construction.
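
The "at least 6 out of 11 HCs" criterion is a simple counting rule. The sketch below shows one way to express it in Python; the gene-to-cancer assignments are toy data invented for illustration, and `common_altered` is a hypothetical helper name.

```python
from collections import Counter


def common_altered(altered_by_cancer, min_cancers=6):
    """Return genes altered in at least `min_cancers` cancers, with their counts."""
    counts = Counter()
    for genes in altered_by_cancer.values():
        counts.update(set(genes))  # count each gene once per cancer
    return {gene: n for gene, n in counts.items() if n >= min_cancers}


# Toy data: which cancers each gene is altered in is made up for this example.
cancers = ["breast", "colorectal", "endometrial", "gastric", "liver", "lung",
           "ovarian", "pancreatic", "prostate", "testicular", "bladder"]
altered_by_cancer = {name: ["GAPDH"] for name in cancers}
for name in cancers[:7]:
    altered_by_cancer[name].append("ZEB2")
for name in cancers[:3]:
    altered_by_cancer[name].append("PKIB")

common = common_altered(altered_by_cancer)
print(common)
```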

Detection of Shared Cancer-Susceptibility Regions

The number of differentially expressed genes was counted for
each region (as the frequency of the region) using an in-house
Python script (available in Script S1). The frequency of each
region involved in expression was calculated
for probesets with at least 2-fold symmetrical changes (Table S12)
and for the first 200 probesets with the highest fold changes
(Table S13). Next, for each region, the percentage of region
participation in differentially expressed probesets across all 11
types of HCs was calculated using the following equations:

\[
\text{Region participation for over-expressed probesets (\%)} = \frac{\mathrm{FOR}}{\mathrm{FTP} \times n} \times 100
\]

where FOR is the frequency of the region for over-expressed probesets
(summed over the 11 HCs), n is the number of cancers (here, 11) and
FTP is the frequency of the region for total probesets (Tables S14 and S15).

\[
\text{Region participation for down-expressed probesets (\%)} = \frac{\mathrm{FDR}}{\mathrm{FTP} \times n} \times 100
\]

where FDR is the frequency of the region for down-expressed
probesets (summed over the 11 HCs), n is the number of cancers (here,
11) and FTP is the frequency of the region for total probesets (Tables
S14 and S15). Finally, the five regions with the highest ratio were
selected as potential cancer-risk regions for each chromosome.
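
The region participation formula and the top-five selection can be sketched as follows. This is not the authors' Script S1: the function names are hypothetical and the counts are toy values chosen only to exercise the arithmetic, with each region given as a pair (altered frequency summed over cancers, total probesets in the region).

```python
def region_participation(freq_altered, freq_total, n_cancers=11):
    """(F / (FTP * n)) * 100, where F is FOR or FDR summed over all cancers."""
    if freq_total == 0:
        return 0.0
    return freq_altered / (freq_total * n_cancers) * 100.0


def top_regions(per_region, n_cancers=11, top=5):
    """Rank the regions of one chromosome by participation and keep the best `top`."""
    scored = {
        region: region_participation(freq, total, n_cancers)
        for region, (freq, total) in per_region.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy counts for three regions of one chromosome (not taken from the tables)
example_counts = {"1p31.2": (9, 3), "1q31.3": (4, 2), "1p32.2": (2, 4)}
ranked = top_regions(example_counts, top=2)
for region, percent in ranked:
    print(region, round(percent, 2))
```

Dividing by FTP × n normalizes for both the number of probesets mapped to a region and the number of cancers, so that a probeset-dense region does not rank highly on size alone.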

In addition, the percentage of chromosome participation in
differentially expressed probesets across all 11 HCs was calculated
using the following equations:

\[
\text{Chromosome participation for over-expressed probesets (\%)} = \frac{\mathrm{FOC}}{\mathrm{FCTP} \times n} \times 100
\]

where FOC is the frequency of the chromosome for over-expressed
probesets (summed over the 11 HCs), n is the number of cancers (here,
11) and FCTP is the frequency of the chromosome for total probesets
(Table S16).

\[
\text{Chromosome participation for down-expressed probesets (\%)} = \frac{\mathrm{FDC}}{\mathrm{FCTP} \times n} \times 100
\]

where FDC is the frequency of the chromosome for down-expressed
probesets (summed over the 11 HCs), n is the number of cancers (here,
11) and FCTP is the frequency of the chromosome for total probesets
(Table S16). Moreover, the percentages of chromosome participation for
each cancer (Table S17) were calculated as the ratio of the
chromosome frequency for altered probesets to the chromosome
frequency for total probesets. The differences between
chromosomes were investigated using a general chi-squared test.
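
The text does not specify how the chi-squared test was set up, so the following Python sketch is only one plausible reading: a Pearson chi-squared statistic on a contingency table with one row per chromosome and columns for altered and unaltered probesets. The counts are hypothetical, and the p-value lookup is left out (in practice a statistics library such as SciPy's `chi2_contingency` could supply it).

```python
def chi_square_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two chromosomes, columns are
# (altered probesets, unaltered probesets)
counts = [[30, 70],
          [10, 90]]
statistic, dof = chi_square_statistic(counts)
print(statistic, dof)
```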

Construction of Networks on Common Altered mRNAs
and miRNAs

Pathway Studio 9 software (Ariadne Genomics, Rockville, MD)
was used to construct the different networks. Pathway Studio uses the
RESNET Mammal database, which is a comprehensive pathway
and molecular interaction database [46]. This database includes
new aliases for human genes, miRNAs and entries from other
mammals. The shortest path algorithm was used to construct the
networks based on altered mRNAs and miRNAs [47].
Five networks were constructed based on common altered RNAs:
a network of commonly altered mRNAs, a network of
commonly altered mRNAs on PCSRs, a network of commonly
altered miRNAs, a network of commonly altered miRNAs on
PCSRs and an integrative network of common altered mRNAs and
miRNAs. The biological processes of each network were identified
using the DAVID (http://david.abcc.ncifcrf.gov/tools.jsp) suite of
bioinformatics tools. DAVID bioinformatics resources consist of
an integrated biological knowledgebase and analytic tools aimed at
systematically extracting biological meaning from large
gene/protein lists [48].
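
To illustrate what a shortest-path search over a regulatory graph looks like, here is a minimal breadth-first search in Python. It is not Pathway Studio's implementation, whose internals are not described here; the small directed graph only encodes relations mentioned in the Results (for example, mir-20b and mir-141 acting on DDX5, and TP53 acting on mir-200c and GAPDH), and edge signs are ignored.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest directed path; None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Regulator -> targets, taken from relations named in the Results section
regulatory_graph = {
    "mir-20b": ["DDX5"],
    "mir-141": ["DDX5", "ZEB2"],
    "mir-200c": ["ZEB2"],
    "DDX5": ["mir-21", "mir-182"],
    "TP53": ["mir-200c", "GAPDH"],
    "MYC": ["GAPDH"],
}

print(shortest_path(regulatory_graph, "mir-20b", "mir-21"))
print(shortest_path(regulatory_graph, "TP53", "ZEB2"))
```

Connecting every pair of input entities by such shortest paths, and taking the union of the paths, is one common way a network is grown from a list of seed genes.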

Promoter Analysis of Altered RNAs 

Promoter analysis was conducted for co-expressed mRNAs
across different cancers using Pscan [49]. Transcription factors
(TFs) were predicted in the promoter regions (-1 kb to 0) of
mRNAs using the JASPAR database (TFs with P-value<0.1 were
selected). In the case of miRNAs, common regulators were
predicted for altered miRNAs in the same region using the JASPAR web
tool (http://jaspar.genereg.net/). TFs were predicted in the
putative promoter regions (-3 kb to +1 kb) of microRNAs with
a relative profile score threshold of at least 99%. Expression of
predicted TFs was determined using transcript-microarray
expression data of 11 different cancers including breast, colorectal,
endometrial, gastric, liver, lung, ovarian, pancreatic, prostate,
testicular, bladder, intestine neuroendocrine, cervical and renal
cancers as well as glioblastoma.